Fast and Accurate Language Detection in Short Texts using Contextual Entropy
Similar resources
Fast and Accurate Language Detection in Short Texts using Contextual Entropy
In this work we address the problem of language identification (LI) on short segments of text. The central idea is to compute the entropy of a document in different contexts and assign it to the category where the entropy is maximal. Only word distributions are needed for the task; no other training is done. For LI the contexts are the languages, and classification is done by just evaluating th...
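The idea in this abstract can be sketched as follows. This is only one possible reading, not the authors' implementation: the abstract is truncated and does not give the exact entropy formula, so the sketch assumes the entropy of a document in a language is the sum of -p log p over its tokens, with p taken from that language's word distribution and unknown words contributing zero. The names word_distribution, contextual_entropy, and identify, and the toy corpora, are hypothetical.

```python
import math
from collections import Counter


def word_distribution(corpus_text):
    """Relative word frequencies of a whitespace-tokenized corpus."""
    counts = Counter(corpus_text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def contextual_entropy(text, dist):
    """Assumed definition: sum of -p*log(p) over the tokens of `text`,
    where p is the token's probability in the language distribution `dist`.
    Tokens unseen in that language contribute nothing."""
    entropy = 0.0
    for word in text.lower().split():
        p = dist.get(word, 0.0)
        if p > 0.0:
            entropy -= p * math.log(p)
    return entropy


def identify(text, dists):
    """Assign the text to the language (context) with maximal entropy."""
    return max(dists, key=lambda lang: contextual_entropy(text, dists[lang]))


# Toy word distributions; a real system would estimate them from large corpora.
dists = {
    "en": word_distribution("the cat sat on the mat and the dog ran"),
    "es": word_distribution("el gato se sentó en la alfombra y el perro corrió"),
}

print(identify("the dog", dists))
print(identify("el perro", dists))
```

Because only per-language word frequencies are looked up, classifying a short text costs one dictionary lookup per token per language, which is consistent with the abstract's claim that no training beyond word distributions is needed.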
Change-Detection Using Contextual Information and Fuzzy Entropy Principle
This paper presents an unsupervised change detection method for computing the amount of changes that have occurred within an area by using remotely sensed technologies and fuzzy modeling. The discussion concentrates on the formulation of a standard procedure that, using the concept of fuzzy sets and fuzzy logic, can define the likelihood of changes detected from remotely sensed data. The fuzzy ...
Statistical Language Identification of Short Texts
Although correctly identifying the language of short texts should prove useful in a large number of applications, few satisfactory attempts are reported in the literature. In this paper we describe a Naive Bayes Classifier that performs well on very short texts, as well as the corpus that we created from movie subtitles for training it. Both the corpus and the algorithm are available under the G...
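A Naive Bayes language classifier of the general kind mentioned in this abstract might look like the sketch below. The abstract does not state which features, smoothing, or priors the authors use, so everything here is an assumption made for illustration: character bigrams as features, add-one smoothing, and uniform language priors. The function names and the tiny training samples are hypothetical and are not taken from the paper or its subtitle corpus.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Character n-grams of the lowercased text, padded with spaces."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(samples, n=2):
    """samples maps a language code to a list of training strings.
    Returns per-language n-gram counts and the shared n-gram vocabulary."""
    counts = {
        lang: Counter(g for s in texts for g in char_ngrams(s, n))
        for lang, texts in samples.items()
    }
    vocab = set().union(*counts.values())
    return counts, vocab


def classify(text, model, n=2):
    """Return the language with the highest log-likelihood, assuming
    uniform priors and add-one (Laplace) smoothing."""
    counts, vocab = model
    vocab_size = len(vocab)
    best_lang, best_score = None, -math.inf
    for lang, lang_counts in counts.items():
        total = sum(lang_counts.values())
        score = sum(
            math.log((lang_counts[g] + 1) / (total + vocab_size))
            for g in char_ngrams(text, n)
        )
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang


# Toy training data for illustration only.
model = train({
    "en": ["the cat sat on the mat", "where is the dog"],
    "es": ["el gato está en la casa", "dónde está el perro"],
})

print(classify("the dog", model))
print(classify("el perro", model))
```

Character n-grams are a common choice for very short inputs because even a two-word text yields several features, but the paper may well use a different feature set.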
Off-topic essay detection using short prompt texts
Our work addresses the problem of predicting whether an essay is off-topic to a given prompt or question without any previously seen essays as training data. Prior work has used similarity between essay vocabulary and prompt words to estimate the degree of on-topic content. In our corpus of opinion essays, prompts are very short, and using similarity with such prompts to detect off-topic essays y...
Journal
Journal title: Research in Computing Science
Year: 2015
ISSN: 1870-4069
DOI: 10.13053/rcs-90-1-27